Abstract — In this paper, a novel low-complexity adaptive decision feedback detector with parallel decision feedback and constellation constraints (P-DFCC) is proposed for multiuser MIMO systems. We propose a constrained constellation map that introduces a number of selected points which serve as the feedback candidates for interference cancellation. By introducing a reliability check, a higher degree of freedom is made available to refine the unreliable estimates. The P-DFCC is followed by an adaptive receive filter to estimate the transmitted symbol. In order to reduce the complexity of computing the filters with time-varying MIMO channels, an adaptive recursive least squares (RLS) algorithm is employed in the proposed P-DFCC scheme. An iterative detection and decoding (Turbo) scheme is considered with the proposed P-DFCC algorithm. Simulations show that the proposed technique has a complexity comparable to that of the conventional parallel decision feedback detector, while it obtains a performance close to that of the maximum likelihood detector in the low to medium SNR range.

Index Terms — RLS, multiuser detection, MIMO, adaptive receivers, iterative (Turbo) processing.



I. Introduction 

MULTI-user detection (MUD) [1] algorithms have been shown to be applicable to the uplink of 3G and next-generation multi-antenna communication systems. MUD can also be applied to spatially multiplexed multi-input multi-output (MIMO) wireless communication systems to form a spatial division multiple access (SDMA) scheme. In such systems, multiple users operate simultaneously within the same frequency band and the spatial dimension is exploited, which can significantly increase the bandwidth efficiency. In order to successfully restore the signals from the received signal combination, pre-coding [2] and decoding [2], [3] techniques have been developed at the transmitter side and the receiver side, respectively. Since in a multiple access uplink scenario it is difficult for each user equipment (UE) to know the channel state information (CSI) of the others, in this paper we focus on the decoding part.

Several detection techniques have been developed for use at the receiver to suppress the multi-access interference (MAI), recover the simultaneously transmitted signals and increase the throughput for the served UEs [4]. The optimal maximum likelihood detection (MLD) [5] scheme has a complexity that grows exponentially with the number of data streams and the modulation level, which is impractical even for systems with a moderate number of UEs. Cost-effective ML solutions such as sphere decoders (SD) [6], [7] approach the optimal performance with reduced complexity [8]; however, their complexity is still lower bounded by a polynomial or exponential function, depending on the number of UEs as well as the signal-to-noise ratio (SNR) [9]. In order to avoid the high complexity of ML or near-ML detectors, linear detectors based on minimum mean square error (MMSE) or zero-forcing receive filters have been investigated. Generally, linear detectors experience a performance loss and achieve a lower capacity. A decision feedback receiver with successive decision feedback (S-DF) [10] or with parallel decision feedback (P-DF) [12] can be employed to achieve a higher capacity. These DF receiver structures [11], [13], [16], [17] are preferred as they offer an attractive trade-off between performance and complexity, which is usually a key concern in multiple access systems.

Part of this paper was presented at ICASSP 2012. Peng Li, Jingjing Liu and Rodrigo C. de Lamare are with the Department of Electronics, The University of York, England, UK, YO10 5DD, e-mail: (pl534,rcdl500,jl622)@ohm.york.ac.uk.

The S/P-DF architectures are able to provide high spectral efficiencies when multiple transmit antennas are deployed [13]. However, their application to systems with time-varying channels is difficult due to the excessive computational load required for updating the receive filter coefficients and tracking the channel [17]. The estimation of the receive filter weights and of the CSI requires matrix inversions and other operations that lead to a significant number of computations.

As an alternative, training-aided adaptive techniques may be deployed for multiuser systems in time-varying channels [18]. Adaptive algorithms can be used to track the channels and to avoid excessive computations when the channels are varying. In [18], the authors developed a low-complexity data-aided adaptive technique for detection over time-varying channels based on the GDF [13] structure, in which the weight vectors are updated using a recursive least squares (RLS) based algorithm. The multiple access interference introduced by spatial multiplexing can be suppressed in a serial or parallel manner, and the transmitted symbols are estimated at each stage. Despite its many benefits, there is a large performance loss when one compares the performance of a DF-based receiver with that of the optimal detector. This is due to the fact that (1) the DF structure cannot provide the full receive diversity order achieved by the optimal MLD in spatially multiplexed systems; (2) the average performance of S/P-DF is dominated by the data stream with the lowest SINR, and the effect of error propagation is inevitable [14]; (3) with the adaptive solution, the receive filter is driven by the decisions made at the previous time instant, so that erroneous decisions lead to unreliable filter weights.

To address these problems, an adaptive multiuser decision feedback solution is proposed for time-varying multiple access MIMO channels. The so-called adaptive decision feedback detection with parallel interference cancellation and constellation constraints (P-DFCC) algorithm employs a constrained constellation map which introduces a list of feedback candidates for P-DF detection. By calculating the bit and symbol reliability, a higher degree of freedom is introduced to refine the unreliable estimates in the cancellation stage. The proposed algorithm is able to significantly improve the performance of a traditional adaptive S-DF or P-DF detector and to close the gap to the MLD. Thanks to the reliability calculation, the proposed algorithm obtains the combination list at a small additional computational cost.

We also consider a spatially multiplexed multiuser iterative detection and decoding (IDD) scheme incorporating the proposed structure. In this coded system, the soft-input soft-output (SISO) detector is required to produce soft-decision values in terms of log-likelihood ratios (LLRs). The proposed SISO detector uses the generated combination list to compute the likelihood of each transmitted bit, so that the reliability of each decision is conveyed to the decoder. This SISO detector is further concatenated with a SISO channel decoder to form a turbo structure, which lowers the SNR requirement of the adaptive MUD receiver. Computer simulations indicate that the proposed P-DFCC algorithm significantly outperforms the conventional S/P-DF schemes (e.g., [18]) and approaches the optimal performance with very low additional detection complexity.

The main contributions of this paper are: 

• An adaptive decision feedback based algorithm is developed for data detection in time-varying MIMO channels.

• A P-DF receiver structure is investigated with the adaptive scheme, and constellation constraints (CC) are incorporated in the receiver to enhance the performance of interference cancellation.

• The error performance and the detection complexity of 
the proposed algorithm are compared with several popular 
existing S/P-DF and optimal detection schemes. 

• A SISO detector is developed as a component of a 
multiuser IDD receiver structure. 

The organization of this paper is as follows. Section II gives the multiuser spatial multiplexing MIMO system model as well as the conventional S/P-DF detectors and the optimal detection criterion. The proposed P-DFCC and its implementation are described in Section III, followed by a complexity comparison in Section IV. The iterative detection and decoding structure is introduced in Section V. The simulation results are given in Section VI, and Section VII concludes the paper.

II. System and Data model 

Let us consider an uplink MU-MIMO system with $K$ UEs and an access point (AP). Each UE is equipped with a single antenna. At the AP, $N_r$ receive antennas are available for collecting and processing the signals. Throughout this paper, the complex baseband notation is used, while vectors and matrices are written in lower-case and upper-case boldface, respectively. We assume that the signals of the UEs are perfectly synchronized at the AP. At each time instant $[i]$, the $K$ users simultaneously transmit $K$ symbols, which are organized into a vector $\mathbf{s}[i] = [s_1[i], s_2[i], \ldots, s_K[i]]^T$, where $(\cdot)^T$ denotes the transpose operation, and whose entries are chosen from a complex $C$-ary constellation set $\mathcal{X} = \{a_1, a_2, \ldots, a_C\}$. The symbol vector $\mathbf{s}[i]$ is transmitted over time-varying channels, and the received signal is processed at the AP by $N_r$ spatially uncorrelated antennas. The received signal is collected to form an $N_r \times 1$ vector with sufficient statistics for detection:

$$\mathbf{r}[i] = \sum_{k=1}^{K} \mathbf{h}_k[i]\, s_k[i] + \mathbf{v}[i] = \mathbf{H}[i]\,\mathbf{s}[i] + \mathbf{v}[i], \tag{1}$$

where the $N_r \times 1$ vector $\mathbf{v}[i]$ represents zero-mean complex circularly symmetric Gaussian noise with covariance matrix $E[\mathbf{v}[i]\mathbf{v}^H[i]] = \sigma_v^2\mathbf{I}$, $\sigma_v^2$ is the noise variance, $\mathbf{I}$ is the identity matrix, $E[\cdot]$ stands for the expected value and $(\cdot)^H$ denotes the Hermitian operator. The symbol vector $\mathbf{s}[i]$ has zero mean and covariance matrix $E[\mathbf{s}[i]\mathbf{s}^H[i]] = \sigma_s^2\mathbf{I}$, where $\sigma_s^2$ is the signal power of each transmitting UE. Furthermore, the elements of $\mathbf{H}[i]$ are the time-varying complex channel gains from the $k$-th UE to the $n_R$-th receive antenna, which follow Jakes' model [20]. The $N_r \times 1$ vector $\mathbf{h}_k[i]$ contains the channel coefficients of user $k$, so that $\mathbf{H}[i] = [\mathbf{h}_1[i], \ldots, \mathbf{h}_K[i]]$ is formed by the channel vectors of all users. As the optimal SINR-based nulling and cancellation order (NCO) [18] requires a high computational complexity, we determine the NCO by computing the norms of the column vectors corresponding to all users, and we detect the users in decreasing order of their norms.

Fig. 1. Spatially multiplexed multiple access system. We assume the transmitted signals from the $K$ UEs are spatially uncorrelated and $K < N_r$.
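To make the signal model concrete, the Python sketch below draws one snapshot of (1) and computes the norm-based detection order just described. It is a minimal sketch under assumptions not stated in the paper: the channel is a single i.i.d. Rayleigh snapshot rather than a Jakes-model trajectory, QPSK symbols have unit energy, and all helper names (`complex_gaussian`, `received_vector`, `norm_ordering`) are hypothetical.

```python
import math
import random


def complex_gaussian(rng, var):
    """Circularly symmetric complex Gaussian sample with variance var."""
    std = math.sqrt(var / 2)
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def qpsk_symbol(rng):
    """Unit-energy QPSK symbol (+-1 +-1j)/sqrt(2)."""
    return complex(rng.choice((-1.0, 1.0)), rng.choice((-1.0, 1.0))) / math.sqrt(2)


def received_vector(H, s, noise_var, rng):
    """r = H s + v as in (1); H is a list of N_r rows with K entries each."""
    return [sum(h * x for h, x in zip(row, s)) + complex_gaussian(rng, noise_var)
            for row in H]


def norm_ordering(H):
    """Detection order: user indices sorted by decreasing column norm ||h_k||."""
    K = len(H[0])
    norms = [math.sqrt(sum(abs(row[k]) ** 2 for row in H)) for k in range(K)]
    return sorted(range(K), key=lambda k: norms[k], reverse=True)


if __name__ == "__main__":
    rng = random.Random(0)
    n_r, k = 4, 3
    H = [[complex_gaussian(rng, 1.0) for _ in range(k)] for _ in range(n_r)]
    s = [qpsk_symbol(rng) for _ in range(k)]
    r = received_vector(H, s, noise_var=0.1, rng=rng)
    print("detection order:", norm_ordering(H))
```

Sorting by column norm only needs $K$ vector norms per channel update, which is why it is used here in place of the SINR-based ordering.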

A. Optimal Detection 

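The optimal ML detection algorithm tries all possible transmitted signal vectors for the given channel $\mathbf{H}$. The detector computes the Euclidean distance $J_{\text{Euclidean}}(\mathbf{s}) = \|\mathbf{r} - \mathbf{H}\mathbf{s}\|^2$, and the signal vector with the minimum Euclidean distance is determined as the estimate of the transmitted signal:

$$\hat{\mathbf{s}}_{\text{ML}} = \arg\max_{\mathbf{s}\in\mathcal{X}^K} P(\mathbf{r}\,|\,\mathbf{s}) = \arg\max_{\mathbf{s}\in\mathcal{X}^K} \frac{1}{(\pi\sigma_v^2)^{N_r}} \exp\!\left(-\frac{\|\mathbf{r}-\mathbf{H}\mathbf{s}\|^2}{\sigma_v^2}\right) = \arg\min_{\mathbf{s}\in\mathcal{X}^K} J_{\text{Euclidean}}(\mathbf{s}). \tag{2}$$

DRAFT PAPER FOR IET COMMUNICATIONS, APRIL 2012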



Similarly to MAP detection, the algorithm requires an exhaustive search over the $|\mathcal{X}|^K$ candidate vectors in (2). The high complexity of the metric calculation prevents the practical application of these detectors, except for very small systems and constellations.
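The exhaustive search in (2) can be sketched in a few lines, which also makes its $|\mathcal{X}|^K$ cost explicit. This is an illustrative sketch only: it assumes an unnormalized QPSK alphabet and a noise-free example, and the names `euclidean_metric` and `ml_detect` are hypothetical.

```python
import itertools

# Unnormalized QPSK alphabet (+-1 +-1j); the scaling is an illustrative choice.
QPSK = [complex(a, b) for a in (-1, 1) for b in (-1, 1)]


def euclidean_metric(r, H, s):
    """J_Euclidean(s) = ||r - H s||^2 with H given as a list of rows."""
    return sum(abs(ri - sum(h * x for h, x in zip(row, s))) ** 2
               for ri, row in zip(r, H))


def ml_detect(r, H, constellation):
    """Exhaustive ML search (2): evaluates all |X|^K candidate vectors."""
    K = len(H[0])
    return min(itertools.product(constellation, repeat=K),
               key=lambda s: euclidean_metric(r, H, s))


if __name__ == "__main__":
    H = [[1.0, 0.5], [0.2, 1.0], [0.3, -0.4]]
    s_true = (1 + 1j, -1 + 1j)
    r = [sum(h * x for h, x in zip(row, s_true)) for row in H]  # noise-free
    print(ml_detect(r, H, QPSK))  # recovers s_true
```

With $K = 8$ QPSK users the same loop would already visit $4^8 = 65536$ candidates per received vector, which illustrates why the full search is avoided.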

B. Successive and Parallel DF Receivers 

Let $\hat{\mathbf{s}}[i] = [\hat{s}_1[i], \hat{s}_2[i], \ldots, \hat{s}_K[i]]^T$ represent the detected symbol vector. The soft symbol estimates $u_k[i]$ are obtained by calculating the difference between the output of the forward receive filter and the output of the backward receive filter, as described in [18], and are given by

$$u_k[i] = \mathbf{w}_{f,k}^H[i]\,\mathbf{r}[i] - \mathbf{w}_{b,k}^H[i]\,\mathbf{d}_k[i], \tag{3}$$

where the column vector $\mathbf{w}_{f,k}[i] \in \mathbb{C}^{N_r\times 1}$ denotes the forward receive filter. The column vector $\mathbf{w}_{b,k}[i]$ denotes a backward receive filter with dimension $k-1$ for successive decision feedback (S-DF) detection, or $K-1$ for parallel decision feedback (P-DF) detection.

1) S-DF: S-DF detection is illustrated in Fig. 2(a), where the backward receive filter $\mathbf{w}_{b,k}[i] \in \mathbb{C}^{(k-1)\times 1}$ has $k-1$ weight elements, so that the size of $\mathbf{w}_{b,k}[i]$ increases as $k$ grows. The forward filters $\mathbf{w}_{f,k}[i]$ act as the nulling vectors of the V-BLAST algorithm. Then, for each data stream $k = 1, \ldots, K$, the decisions are accumulated and cancelled by the $(k-1)$-dimensional filter $\mathbf{w}_{b,k}[i]$. The backward receive filter is empty for the first user, since no decisions are available yet. For the following users, the $(k-1)$-dimensional detected symbol vector is obtained as

$$\mathbf{d}_k[i] = [\hat{s}_1[i], \hat{s}_2[i], \ldots, \hat{s}_{k-1}[i]]^T. \tag{4}$$

The S-DF detection can provide a diversity order of $N_r - K + k$ for each user $k$, assuming that perfect interference cancellation is performed by the receiver.

2) P-DF: Assuming perfect interference cancellation, P-DF is able to provide a higher diversity order than S-DF based detection algorithms. Similarly to the S-DF scheme, the P-DF first processes the received signal $\mathbf{r}[i]$ with the forward receive filter $\mathbf{w}_{f,k}[i] \in \mathbb{C}^{N_r\times 1}$. However, as shown in Fig. 2(b), the backward receive filter differs from that of S-DF: it is given by $\mathbf{w}_{b,k}[i] \in \mathbb{C}^{(K-1)\times 1}$, and the decision feedback symbol vector is defined as

$$\mathbf{d}_k[i] = [\hat{s}_1[i], \ldots, \hat{s}_{k-1}[i], \hat{s}_{k+1}[i], \ldots, \hat{s}_K[i]]^T, \tag{5}$$

where the decision for user $k$, $\hat{s}_k[i] = Q\{u_k[i]\}$, is obtained by applying a slicer represented by $Q\{\cdot\}$.

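For notational convenience, the forward and backward filters can be concatenated together as [18]

$$\mathbf{w}_k[i] = \begin{cases} \mathbf{w}_{f,k}[i], & k = 1, \\ \big[\mathbf{w}_{f,k}^T[i],\, -\mathbf{w}_{b,k}^T[i]\big]^T, & k = 2, \ldots, K. \end{cases} \tag{6}$$

Similarly, the input can be concatenated as

$$\tilde{\mathbf{r}}_k[i] = \begin{cases} \mathbf{r}[i], & k = 1, \\ \big[\mathbf{r}^T[i],\, \mathbf{d}_k^T[i]\big]^T, & k = 2, \ldots, K. \end{cases} \tag{7}$$

Then, we can rewrite the soft estimates (3) as

$$u_k[i] = \mathbf{w}_k^H[i]\,\tilde{\mathbf{r}}_k[i]. \tag{8}$$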



The forward and backward filters can be jointly optimized by using an MMSE criterion or by solving a least squares problem. To reduce the computational complexity, the proposed structure adopts the recursive least squares (RLS) algorithm for the design of the forward and backward filters. It should be noted that other advanced parameter estimation algorithms, such as reduced-rank techniques [12], [21], can also be used.
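The feedback vectors (4) and (5), and the equivalence between (3) and the concatenated form (6)-(8), can be checked with a short sketch. It assumes 1-based user indices as in the text, an unnormalized QPSK slicer for $Q\{\cdot\}$, and the P-DF feedback vector (5) with user $k$'s own entry omitted; all function names are hypothetical and the filter values in the check are arbitrary.

```python
def sdf_feedback(decisions, k):
    """S-DF feedback vector (4): decisions of users 1..k-1 (k is 1-based)."""
    return list(decisions[: k - 1])


def pdf_feedback(decisions, k):
    """P-DF feedback vector (5): decisions of all users except user k (1-based)."""
    return list(decisions[: k - 1]) + list(decisions[k:])


def soft_estimate(w_f, w_b, r, d):
    """u_k = w_f^H r - w_b^H d, as in (3)."""
    forward = sum(w.conjugate() * x for w, x in zip(w_f, r))
    backward = sum(w.conjugate() * x for w, x in zip(w_b, d))
    return forward - backward


def concatenated_estimate(w_f, w_b, r, d):
    """Same quantity via (6)-(8): w_k = [w_f; -w_b], r_tilde = [r; d_k]."""
    w = list(w_f) + [-x for x in w_b]
    r_tilde = list(r) + list(d)
    return sum(wi.conjugate() * xi for wi, xi in zip(w, r_tilde))


def qpsk_slicer(u):
    """Q{.}: nearest point of the unnormalized QPSK alphabet (+-1 +-1j)."""
    return complex(1.0 if u.real >= 0 else -1.0, 1.0 if u.imag >= 0 else -1.0)
```

The minus sign on the backward part of the concatenated filter is what makes a single inner product in (8) reproduce the difference in (3).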

III. Adaptive P-DF with Constellation Constraints

A. Computation of P-DF filters 

The structure and the signal processing model of the proposed DF detector are depicted in Fig. 3. We denote the receive filter of each user as $\mathbf{w}_k[i]$ ($k = 1, \ldots, K$), whose entries can be obtained by solving the standard least squares (LS) problem. The LS cost function with an exponential window is given by

$$J_k[i] = \sum_{\tau=1}^{i} \lambda^{i-\tau} \big|\hat{s}_k[\tau] - \mathbf{w}_k^H[i]\,\tilde{\mathbf{r}}_k[\tau]\big|^2, \tag{9}$$

where $0 < \lambda < 1$ is the forgetting factor and the scalar $\hat{s}_k[\tau]$ denotes the detected symbol at time index $\tau$, or the known pilot, in which case $\hat{s}_k[\tau] = s_k[\tau]$. The optimal tap-weight vector minimizing $J_k[i]$ is given by



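$$\mathbf{w}_k[i] = \boldsymbol{\Phi}_k^{-1}[i]\,\mathbf{p}_k[i], \tag{10}$$

where the time-averaged correlation matrix is obtained by $\boldsymbol{\Phi}_k[i] = \sum_{\tau=1}^{i}\lambda^{i-\tau}\tilde{\mathbf{r}}_k[\tau]\tilde{\mathbf{r}}_k^H[\tau]$ with $\boldsymbol{\Phi}_k[0] = \mathbf{0}$, and the time-averaged cross-correlation vector is defined by $\mathbf{p}_k[i] = \sum_{\tau=1}^{i}\lambda^{i-\tau}\tilde{\mathbf{r}}_k[\tau]\hat{s}_k^*[\tau]$.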

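Using the recursive least squares (RLS) algorithm [19], the optimal weights in (10) can be calculated recursively as follows:

$$\mathbf{q}_k[i] = \boldsymbol{\Phi}_k^{-1}[i-1]\,\tilde{\mathbf{r}}_k[i], \tag{11}$$

$$\mathbf{k}_k[i] = \frac{\lambda^{-1}\mathbf{q}_k[i]}{1 + \lambda^{-1}\tilde{\mathbf{r}}_k^H[i]\,\mathbf{q}_k[i]}, \tag{12}$$

$$\boldsymbol{\Phi}_k^{-1}[i] = \lambda^{-1}\boldsymbol{\Phi}_k^{-1}[i-1] - \lambda^{-1}\mathbf{k}_k[i]\,\mathbf{q}_k^H[i], \tag{13}$$

$$\mathbf{w}_k[i] = \mathbf{w}_k[i-1] + \mathbf{k}_k[i]\,\xi_k^*[i], \tag{14}$$

where

$$\xi_k[i] = \begin{cases} s_k[i] - \mathbf{w}_k^H[i-1]\,\tilde{\mathbf{r}}_k[i], & \text{training mode}, \\ \hat{s}_k[i] - \mathbf{w}_k^H[i-1]\,\tilde{\mathbf{r}}_k[i], & \text{decision-directed mode}. \end{cases} \tag{15}$$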

As indicated in (15), this adaptive detection algorithm works in two modes. The first one is employed with the training sequence, while the second one is the decision-directed mode, which is switched on after the filter weights converge. In the decision-directed mode, the mean square error (MSE) of the estimated symbols has a major impact on the performance of adaptive DF algorithms, because the detection error of the current user may propagate throughout the detection of the following users. Moreover, in time-varying channels a poor $\xi_k[i]$ can easily corrupt $\mathbf{w}_k[i]$ in (14), resulting in burst errors.

Fig. 2. Block diagram of the conventional (a) S-DF scheme and (b) P-DF scheme. An RLS algorithm is employed to iteratively obtain the filter weights.
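The recursion (11)-(15) for a single user can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes the usual regularized start $\boldsymbol{\Phi}_k^{-1}[0] = \delta^{-1}\mathbf{I}$ (the zero matrix $\boldsymbol{\Phi}_k[0]$ is not invertible), uses plain Python lists instead of a linear-algebra library, and the class name `RLSFilter`, the `demo` routine and the values of `lam` and `delta` are hypothetical. The `desired` argument is the pilot in training mode or the slicer decision in decision-directed mode.

```python
import random


class RLSFilter:
    """Exponentially weighted RLS for one user's concatenated filter w_k, following (11)-(15).

    Convention: output u = w^H x, a priori error xi = desired - w^H x,
    update w <- w + k xi^*.
    """

    def __init__(self, n, lam=0.998, delta=1e-2):
        self.lam = lam
        self.w = [0j] * n
        # P tracks Phi_k^{-1}[i]; the initialisation delta^{-1} I is an assumption.
        self.P = [[complex(1.0 / delta if a == b else 0.0) for b in range(n)]
                  for a in range(n)]

    def output(self, x):
        """u_k[i] = w_k^H x, cf. (8)."""
        return sum(wi.conjugate() * xi for wi, xi in zip(self.w, x))

    def update(self, x, desired):
        """One RLS step; desired is the pilot (training) or the decision (decision-directed)."""
        n, lam = len(x), self.lam
        q = [sum(self.P[a][b] * x[b] for b in range(n)) for a in range(n)]     # (11)
        denom = 1 + sum(x[a].conjugate() * q[a] for a in range(n)) / lam
        gain = [qa / (lam * denom) for qa in q]                                # (12)
        for a in range(n):                                                      # (13)
            for b in range(n):
                self.P[a][b] = (self.P[a][b] - gain[a] * q[b].conjugate()) / lam
        xi = desired - self.output(x)        # (15): a priori error, still uses w[i-1]
        self.w = [wi + gi * xi.conjugate() for wi, gi in zip(self.w, gain)]    # (14)
        return xi


def demo(seed=1, steps=200):
    """Identify a fixed filter from noise-free training data; returns the max weight error."""
    rng = random.Random(seed)
    w_true = [0.5 - 0.2j, -0.3 + 0.8j, 0.1 + 0.1j]
    filt = RLSFilter(len(w_true), lam=0.99)
    for _ in range(steps):
        x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in w_true]
        desired = sum(w.conjugate() * xi for w, xi in zip(w_true, x))
        filt.update(x, desired)
    return max(abs(a - b) for a, b in zip(filt.w, w_true))


if __name__ == "__main__":
    print("max weight error:", demo())
```

Each step costs on the order of $n^2$ operations for an $n$-tap filter, instead of the $n^3$ needed to invert $\boldsymbol{\Phi}_k[i]$ directly in (10).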

B. P-DF with Constellation Constraints 

In order to address this problem, the proposed P-DF with constellation constraints (P-DFCC) structure introduces a number of selected constellation points as candidate decisions when the filter output $u_k[i]$ is deemed unreliable. After the system is switched to the decision-directed mode, the concatenated filter output $u_k[i]$ is checked by the CC device, which is illustrated in Fig. 3. The CC structure is defined by the threshold distance $d_{th}$, which can be a constant or determined in terms of the SNR. The reliability of the estimated symbol is determined by the Euclidean distance between the symbol estimate and its nearest constellation point, which is given by

$$d_k = \min_{a_c \in \mathcal{X}} |u_k[i] - a_c|, \tag{16}$$

where $a_c$ denotes a constellation point, so that $d_k$ is the distance from the soft estimate $u_k[i]$ of the $k$-th symbol to its nearest constellation point. The CC device distinguishes the reliable estimates from the unreliable ones, which allows the P-DFCC to avoid redundant processing of reliable feedback and to keep the complexity at the same level as that of the conventional P-DF structure. The remainder of this subsection describes the detection of $\hat{s}_k[i]$ for the $k$-th user.

Let us define two regions for the QPSK constellation map:

(1) The region inside the square obtained by connecting the four constellation points $a_r$, which are assumed to have the form $a_r = \pm e/2 \pm (e/2)j$, where $e$ is the distance between the two nearest constellation symbols. The estimate $u_k[i]$ is considered inside the square if the following conditions hold:

$$|\Re\{u_k[i]\}| < e/2, \qquad |\Im\{u_k[i]\}| < e/2, \tag{17}$$

where $\Re\{\cdot\}$ and $\Im\{\cdot\}$ denote the real part and the imaginary part of a complex-valued quantity, respectively.

(2) Otherwise, the estimate lies in the region outside the square.

1) CASE 1, inside the square: In the first case, the estimate $u_k[i]$ is considered unreliable if the following condition holds:

$$d_k > d_{th}, \tag{18}$$

where $d_k$ is the distance in (16) between the estimated symbol $u_k[i]$ and its nearest constellation point $a_c$. Otherwise, the estimated symbol is close to a constellation point and the decision is considered reliable.

Fig. 3. Block diagram of the proposed P-DFCC multi-user detector. There are $K-1$ interference symbols for each user's backward filter.

Fig. 4. The constellation constraints (CC) device. The CC procedure is invoked when the soft estimate $u_k[i]$ falls into the shaded area. Parameter $e$ denotes the distance between two nearest constellation symbols.

2) CASE 2, outside the square: In this case, the conditions in (17) do not hold, i.e., the estimated symbol $u_k[i]$ lies outside the square. The decision is then deemed unreliable if $u_k[i]$ is close to the I (in-phase) axis or to the Q (quadrature) axis. Therefore, the estimate is unreliable if either of the following conditions holds:

$$|\Re\{u_k[i]\}| < e/2 - d_{th}, \tag{19}$$

$$|\Im\{u_k[i]\}| < e/2 - d_{th}. \tag{20}$$

Otherwise, the estimated symbol is far away from the axis borders and the estimate is considered reliable.
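A minimal sketch of the QPSK constellation-constraint test in (16)-(20) follows. It assumes the unnormalized alphabet $\{\pm 1 \pm j\}$, so that $e = 2$, and takes $d_{th}$ as a given constant; the function names are hypothetical.

```python
# Unnormalized QPSK alphabet {+-1 +-1j}; the nearest-point spacing is e = 2.
QPSK = [complex(a, b) for a in (-1, 1) for b in (-1, 1)]


def nearest_distance(u, constellation):
    """d_k in (16): distance from the soft estimate u to its nearest point."""
    return min(abs(u - a) for a in constellation)


def qpsk_unreliable(u, e, d_th, constellation=QPSK):
    """Return True if u is flagged unreliable by the CC device (QPSK case)."""
    inside = abs(u.real) < e / 2 and abs(u.imag) < e / 2           # (17)
    if inside:
        # CASE 1: the estimate is far from every constellation point
        return nearest_distance(u, constellation) > d_th           # (18)
    # CASE 2: outside the square but close to the I or Q axis
    return abs(u.real) < e / 2 - d_th or abs(u.imag) < e / 2 - d_th  # (19), (20)


if __name__ == "__main__":
    for u in (0.1 + 0.1j, 0.9 + 0.9j, 1.2 + 1.1j, 1.5 + 0.2j):
        print(u, "unreliable" if qpsk_unreliable(u, e=2.0, d_th=0.3) else "reliable")
```

Only the estimates flagged here trigger the list construction, so reliable symbols cost a single distance check, as in the conventional P-DF.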

This can be further extended to multi-tier constellations (e.g., 16-QAM), where the outer tier is treated similarly to CASE 2, but two additional conditions are included in addition to (19) and (20):

$$\min\big(|\Re\{u_k[i]\} + e|,\, |\Re\{u_k[i]\} - e|\big) < e/2 - d_{th}, \tag{21}$$

$$\min\big(|\Im\{u_k[i]\} + e|,\, |\Im\{u_k[i]\} - e|\big) < e/2 - d_{th}, \tag{22}$$

where $|\Re\{u_k[i]\} \pm e|$ are the distances between $u_k[i]$ and the two vertical lines through the points $(\mp e, 0)$, respectively. The quantities $|\Im\{u_k[i]\} \pm e|$ are similarly defined as the distances between $u_k[i]$ and the two horizontal lines through the points $(0, \mp e)$, respectively. Therefore, for 16-QAM constellations, the estimate is considered unreliable if any one of the four conditions (19)-(22) holds. On the other hand, for the inner-tier constellation points, the estimate is considered unreliable if

$$\min_{a_c \in \mathcal{X}} |a_c - u_k[i]| > d_{th}. \tag{23}$$

The CC device distinguishes the reliable feedback signals from the unreliable ones, which allows the P-DFCC to keep the complexity at the same level as that of the conventional DF structure.
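The 16-QAM extension can be sketched in the same way. The text does not spell out how an estimate is assigned to the inner or outer tier, so this sketch assumes that the inner tier is the square bounded by the four inner points ($|\Re\{u_k[i]\}|, |\Im\{u_k[i]\}| < e/2$), where (23) applies, and that (19)-(22) apply everywhere else. The point layout $\{\pm e/2, \pm 3e/2\}^2$ and the function names are also assumptions.

```python
def qam16_points(e):
    """16-QAM alphabet with coordinates in {+-e/2, +-3e/2} (assumed layout)."""
    levels = (-1.5 * e, -0.5 * e, 0.5 * e, 1.5 * e)
    return [complex(a, b) for a in levels for b in levels]


def qam16_unreliable(u, e, d_th):
    """CC test for 16-QAM: (23) inside the inner square, (19)-(22) elsewhere."""
    x, y = u.real, u.imag
    if abs(x) < e / 2 and abs(y) < e / 2:
        # inner tier: unreliable if no constellation point lies within d_th, (23)
        return min(abs(u - a) for a in qam16_points(e)) > d_th
    lim = e / 2 - d_th
    return (abs(x) < lim or abs(y) < lim              # (19), (20): near the axes
            or min(abs(x + e), abs(x - e)) < lim      # (21): near the lines Re = -e, +e
            or min(abs(y + e), abs(y - e)) < lim)     # (22): near the lines Im = -e, +e
```

For example, with $e = 2$ and $d_{th} = 0.2$, an estimate lying on the decision boundary $\Re\{u_k[i]\} = e$ is flagged by (21), while one within $0.1$ of an outer point is accepted.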



Reliable: If the filter output $u_k[i]$ falls into the unshaded area of the constellation map, the decision is considered reliable. The tentative decision for $s_k[i]$ is obtained by

$$\mathcal{C}_k[i] = \arg\min_{a_c \in \mathcal{X}} |a_c - u_k[i]|. \tag{24}$$

Unreliable: If $u_k[i]$ is determined to be unreliable, we proceed by sorting the Euclidean distances between $u_k[i]$ and the constellation points, as computed in (16), in increasing order, and a list of tentative decisions for $s_k[i]$ is obtained as

$$\mathcal{C}_k[i] = \{c_1, c_2, \ldots, c_\tau\}_k, \tag{25}$$

where $1 \le \tau \le |\mathcal{X}|$ and

$$d_e[c_1] \le d_e[c_2] \le \ldots \le d_e[c_\tau], \tag{26}$$

in which $d_e[\cdot]$ denotes the Euclidean distance between $u_k[i]$ and a candidate point.

Therefore, for each user we obtain a tentative decision list $\mathcal{C}_k$. By listing all the combinations of the elements across the $K$ users, a tentative decision list of length $\Gamma$ is formed. Each column vector in the list denotes a possible transmitted symbol vector $\mathbf{s}_l[i]$, where $l = 1, \ldots, \Gamma$. The size of the list is given by

$$\Gamma = \prod_{k=1}^{K} |\mathcal{C}_k|, \qquad 1 \le \Gamma \le |\mathcal{X}|^K, \tag{27}$$

where $|\cdot|$ denotes cardinality. In order to obtain an improved performance, the maximum likelihood (ML) rule can be used to select the best among the $\Gamma$ candidate symbol vectors. The ML selection criterion is equivalent to the minimum Euclidean distance criterion, and the selected vector is given by

$$\hat{\mathbf{s}}_{\text{ML}} = \arg\min_{l = 1, \ldots, \Gamma} \big\|\mathbf{r}[i] - \mathbf{H}[i]\,\mathbf{s}_l[i]\big\|^2, \tag{28}$$

where $\hat{\mathbf{s}}_{\text{ML}}$ is the ML-selected vector, which can be used as the feedback symbols as well as the decision vector.
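The list construction (24)-(27) and the ML selection (28) can be sketched as below. The list length `tau` for unreliable estimates is a hypothetical parameter (the text only bounds it by $|\mathcal{X}|$), the reliability flag is assumed to come from a CC test such as (17)-(23), and the unnormalized QPSK alphabet is an illustrative choice.

```python
import itertools

QPSK = [complex(a, b) for a in (-1, 1) for b in (-1, 1)]


def candidate_list(u, constellation, unreliable, tau):
    """Per-user list C_k: nearest point if reliable (24), else the tau nearest points (25)-(26)."""
    ranked = sorted(constellation, key=lambda a: abs(u - a))  # increasing distance
    return ranked[:tau] if unreliable else ranked[:1]


def ml_select(r, H, lists):
    """Enumerate the Gamma = prod_k |C_k| combinations (27) and apply the ML rule (28)."""
    def metric(s):
        return sum(abs(ri - sum(h * x for h, x in zip(row, s))) ** 2
                   for ri, row in zip(r, H))

    return min(itertools.product(*lists), key=metric)


if __name__ == "__main__":
    H = [[1.0, 0.5], [0.2, 1.0], [0.3, -0.4]]
    s_true = (1 + 1j, -1 + 1j)
    r = [sum(h * x for h, x in zip(row, s_true)) for row in H]
    lists = [candidate_list(0.9 + 0.9j, QPSK, False, 2),   # reliable: one candidate
             candidate_list(0.1 + 0.9j, QPSK, True, 2)]    # unreliable: two candidates
    print(ml_select(r, H, lists))
```

Because reliable users contribute a single candidate, $\Gamma$ here is 2 rather than the $16$ vectors of the full search over $|\mathcal{X}|^K$.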

The value of $\Gamma$ can be regarded as reflecting the trade-off between complexity and performance. With a large threshold $d_{th}$, the proposed scheme tolerates a higher error energy and results in a smaller $\Gamma$, but suffers a performance loss. In the extreme case where $d_{th} = \infty$, the proposed detector is equivalent to a conventional P-DF detector. In contrast, if $d_{th} = 0$, we have $\Gamma = |\mathcal{X}|^K$, which means the proposed receiver performs DF detection with an ML rule that searches for an ML solution for each user. It is also worth mentioning that a maximum list size $\Gamma_{\max}$ can be set to guarantee $1 \le \Gamma \ll |\mathcal{X}|^K$, which prevents a high complexity in the very low SNR range.

By introducing a constellation constraint: (a) the detection diversity is directly related to the threshold $d_{th}$, since a decrease in $d_{th}$ could result in a longer list, which may increase the diversity order; (b) in the low SNR region, a longer list is also more likely to be obtained than in the high SNR region, hence the diversity order tends to be higher. On the other hand, in the high SNR region all the symbol estimates are considered reliable and the diversity order tends to be the same as that of a conventional P-DF (this is similar to increasing the threshold $d_{th}$). This implies that the gain provided by P-DFCC is higher in the low to medium SNR region.
C. Channel Estimation

As discussed in the previous sections, the MIMO channel state information is required for the ML rule (28) and for generating the cancellation ordering codebook for ordered processing [9]. The LS channel estimation algorithm has been investigated in [22]. Based on a weighted average of error squares, the estimated channel minimizes the cost function whose expression at time instant $i$ is defined as



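$$\mathcal{J}[i] = \sum_{\tau=1}^{i} \lambda^{i-\tau} \big\|\mathbf{r}[\tau] - \hat{\mathbf{H}}[i]\,\mathbf{s}[\tau]\big\|^2, \tag{29}$$

where $\hat{\mathbf{H}}[i]$ is the channel matrix estimate at time instant $i$. The quantities $\mathbf{r}[\tau]$ and $\mathbf{s}[\tau]$ are the received signal and the pilot symbol vectors at time instant $\tau$, respectively.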

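To minimize the cost function, the gradient of the cost function with respect to the estimated channel matrix is equated to a zero matrix:

$$\frac{\partial \mathcal{J}[i]}{\partial \hat{\mathbf{H}}^*[i]} = -\sum_{\tau=1}^{i} \lambda^{i-\tau} \big(\mathbf{r}[\tau] - \hat{\mathbf{H}}[i]\,\mathbf{s}[\tau]\big)\mathbf{s}^H[\tau] = \mathbf{0}. \tag{30}$$

By solving the above equation, the LS estimate of the channel matrix is obtained as

$$\hat{\mathbf{H}}[i] = \Big(\sum_{\tau=1}^{i}\lambda^{i-\tau}\mathbf{r}[\tau]\mathbf{s}^H[\tau]\Big)\Big(\sum_{\tau=1}^{i}\lambda^{i-\tau}\mathbf{s}[\tau]\mathbf{s}^H[\tau]\Big)^{-1} = \mathbf{D}[i]\,\boldsymbol{\Phi}^{-1}[i]. \tag{31}$$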

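In order to avoid the matrix inversion operation, a recursive algorithm is developed. Let us define

$$\mathbf{P}[i] = \boldsymbol{\Phi}^{-1}[i], \qquad \boldsymbol{\Phi}[i] = \sum_{\tau=1}^{i}\lambda^{i-\tau}\mathbf{s}[\tau]\mathbf{s}^H[\tau], \tag{32}$$

where $\mathbf{D}[i]$ can be obtained iteratively by

$$\mathbf{D}[i] = \lambda\mathbf{D}[i-1] + \mathbf{r}[i]\,\mathbf{s}^H[i], \tag{33}$$

and $\mathbf{P}[i]$ is calculated iteratively by using the matrix inversion lemma,

$$\mathbf{P}[i] = \lambda^{-1}\mathbf{P}[i-1] - \frac{\lambda^{-2}\,\mathbf{P}[i-1]\,\mathbf{s}[i]\,\mathbf{s}^H[i]\,\mathbf{P}[i-1]}{1 + \lambda^{-1}\mathbf{s}^H[i]\,\mathbf{P}[i-1]\,\mathbf{s}[i]}. \tag{34}$$

The initial states of the parameters are set as $\mathbf{D}[0] = \mathbf{0}_{N_r \times K}$ and $\mathbf{P}[0] = \delta^{-1}\mathbf{I}$, where $\delta$ is a small positive constant.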
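The recursions (33) and (34) can be sketched directly; the estimate $\hat{\mathbf{H}}[i] = \mathbf{D}[i]\mathbf{P}[i]$ then never requires an explicit matrix inverse. This is a sketch under stated assumptions: only pilot vectors are used, the values of $\lambda$ and $\delta$ are arbitrary, and the class name `RecursiveChannelEstimator` and the noise-free `demo` check are hypothetical.

```python
import random


class RecursiveChannelEstimator:
    """Exponentially weighted LS channel estimate H_hat[i] = D[i] P[i], via (33)-(34)."""

    def __init__(self, n_r, k, lam=0.99, delta=1e-3):
        self.lam = lam
        self.D = [[0j] * k for _ in range(n_r)]                       # D[0] = 0 (N_r x K)
        self.P = [[complex(1.0 / delta if a == b else 0.0) for b in range(k)]
                  for a in range(k)]                                  # P[0] = delta^{-1} I

    def update(self, r, s):
        """Process one pilot pair (r[i], s[i])."""
        lam, K = self.lam, len(s)
        # (33): D[i] = lam D[i-1] + r[i] s^H[i]
        self.D = [[lam * self.D[a][b] + r[a] * s[b].conjugate() for b in range(K)]
                  for a in range(len(r))]
        # (34): matrix inversion lemma, no explicit inverse of Phi[i]
        Ps = [sum(self.P[a][b] * s[b] for b in range(K)) for a in range(K)]             # P[i-1] s[i]
        sP = [sum(s[a].conjugate() * self.P[a][b] for a in range(K)) for b in range(K)]  # s^H[i] P[i-1]
        denom = 1 + sum(s[a].conjugate() * Ps[a] for a in range(K)) / lam
        self.P = [[self.P[a][b] / lam - Ps[a] * sP[b] / (lam ** 2 * denom)
                   for b in range(K)] for a in range(K)]

    def estimate(self):
        """H_hat[i] = D[i] P[i], cf. (31)-(32)."""
        K = len(self.P)
        return [[sum(row[m] * self.P[m][b] for m in range(K)) for b in range(K)]
                for row in self.D]


def demo(seed=0, n_r=3, k=2, n_pilots=100):
    """Noise-free check with random QPSK pilots; returns the max entry-wise estimation error."""
    rng = random.Random(seed)
    H_true = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(k)]
              for _ in range(n_r)]
    est = RecursiveChannelEstimator(n_r, k)
    for _ in range(n_pilots):
        s = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(k)]
        r = [sum(h * x for h, x in zip(row, s)) for row in H_true]
        est.update(r, s)
    H_hat = est.estimate()
    return max(abs(H_hat[a][b] - H_true[a][b]) for a in range(n_r) for b in range(k))
```

The small residual error in `demo` comes from the regularization implied by $\mathbf{P}[0] = \delta^{-1}\mathbf{I}$, which decays as $\lambda^i$.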



IV. Iterative Detection and Decoding 

In the previous section, we introduced the concept of constellation constraints and its implications for an uncoded multi-user detection algorithm. In order to reduce the SNR requirement of a MIMO receiver, error-control coding is essential. Iterative detection and decoding (IDD) has been recognized as a central technique for solving a large number of decoding and detection problems in wireless communications. In this section, we are interested in developing IDD algorithms for spatially multiplexed multi-user data streams.

For a multi-user MIMO IDD transmission system, the message is first encoded by an encoder; the coded bits are then interleaved and mapped to symbols before being radiated from a transmit antenna. At the receiver side, the P-DFCC detector is applied to detect the transmitted symbols and to convert the symbol probabilities to bit probabilities in the form of LLRs. The extrinsic information $L_e(\cdot)$ is then exchanged between the detector and the decoder over several iterations. The a posteriori probabilities of the transmitted bits are finally obtained at the output of the decoder.

On one hand, the encoder and decoder blocks are considered as the outer code of a serially concatenated structure. When a non-systematic convolutional (NSC) code is applied, the BCJR [23] based MAP or log-MAP decoding algorithm can be used, as well as the lower-complexity alternative known as the soft-output Viterbi algorithm (SOVA) [24]. Instead of a convolutional code, turbo codes and LDPC [26] codes, along with advanced decoding algorithms [27], can also be used in this structure to obtain a near-capacity performance [28]. On the other hand, the mapping and MIMO detection blocks are considered as the inner component of the serially concatenated structure. In general, MAP is the optimal algorithm for the SISO detection component of the IDD receiver. The MAP detector provides the optimal BER performance; however, its complexity is extremely high. To address this problem, a "list" version of SD was developed by Hochwald and ten Brink without significant loss of performance [7]. The complexity of MIMO detection is further reduced by introducing soft parallel interference cancellation (SPIC) in [25], [29], at the cost of a performance loss. In this section, we adapt the proposed P-DFCC detection algorithm to the IDD structure.

In coded systems, the model in (1) is used repeatedly to describe the transmission of data bits, which are separated into blocks. For a given block, the symbol vector $\mathbf{s}$ is obtained by mapping the coded bits $\mathbf{b} = [b_1, b_2, \ldots, b_{KJ}]$. The quantity $J$ is the number of bits per constellation symbol. For coded transmissions, the vector $\mathbf{b}$ is designated as the output of a forward error-correction code of rate $R < 1$ that introduces redundancy. The transmission rate is then $RKJ$ bits per received vector. In the IDD processing, the detector makes decisions by using the knowledge of correlations across time instants provided by the channel decoder, and the channel decoder decodes the bit information by using the likelihood information on all blocks obtained from the soft-output detector.

For each user, a block of received signals $\mathbf{r}[i]$ is used to compute the a posteriori probabilities in the form of LLRs. With P-DF, the MIMO input-output relation (1) is transformed into $K$ parallel data streams. By assuming that these $K$ streams are statistically independent, we may approximate the intrinsic a posteriori LLRs as [30]

$$\Lambda_1[b_{j,k}[i]] = \log \frac{P[b_{j,k}[i] = +1 \,|\, u_k[i]]}{P[b_{j,k}[i] = -1 \,|\, u_k[i]]}, \quad \forall j, k, \tag{35}$$

which can be evaluated by using Bayes' theorem; we leave the details to [25]. We denote the intrinsic information provided by the decoder as $\lambda_2[b_{j,k}[i]]$, which is defined in terms of the bit probabilities as

$$\lambda_2[b_{j,k}[i]] = \log \frac{P[b_{j,k}[i] = +1]}{P[b_{j,k}[i] = -1]}, \quad \forall j, k. \tag{36}$$
From [25], the bitwise probability is obtained by

$$P[b_{j,k}[i] = b_j] = \frac{\exp\big(b_j\,\lambda_2[b_{j,k}[i]]\big)}{1 + \exp\big(b_j\,\lambda_2[b_{j,k}[i]]\big)} = \frac{1}{2}\left[1 + b_j \tanh\left(\frac{\lambda_2[b_{j,k}[i]]}{2}\right)\right], \tag{37}$$
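where $b_j \in \{+1, -1\}$. Let us simplify the notation as $P[s_k[i]] := P[s_k[i] = c_q]$, where $c_q$ is an element chosen from the constellation $\mathcal{X} = \{c_1, \ldots, c_q, \ldots, c_C\}$. The symbol probability $P[s_k[i]]$ is obtained from the corresponding bitwise probabilities; assuming that the bits are statistically independent, we have

$$P[s_k[i]] = \prod_{j=1}^{J} P[b_{j,k}[i] = b_j] = \frac{1}{2^J}\prod_{j=1}^{J}\left[1 + b_j\tanh\left(\frac{\lambda_2[b_{j,k}[i]]}{2}\right)\right], \tag{38}$$

where $b_1, \ldots, b_J$ are the bits mapped to $c_q$. From (37) and (38), we can conclude that $\sum_{q=1}^{C} P[s_k[i] = c_q] = 1$. By sorting the probabilities obtained by (35) in decreasing order, a list of tentative decisions for $s_k[i]$ is obtained in each stream as

$$\mathcal{C}_k^{\text{IDD}}[i] = \{c_1, c_2, \ldots, c_\tau\}_k, \tag{39}$$

where $1 \le \tau \le |\mathcal{X}|$ and

$$\Pr[c_1] \ge \Pr[c_2] \ge \ldots \ge \Pr[c_\tau], \tag{40}$$

with

$$\Pr[c_q] = P[s_k[i] = c_q \,|\, u_k[i]]. \tag{41}$$

For the IDD coded structure, we replace (25) with (39). Thanks to the error correction, at moderate SNR the size of $\mathcal{C}_k^{\text{IDD}}$ is significantly smaller than that of the list in (25). The pseudo-code for implementing the proposed P-DFCC with the IDD structure is detailed in Algorithm 1.

Algorithm 1: Soft-output max-log P-DFCC detection
Require: $\mathbf{r} \in \mathbb{C}^{N_r\times 1}$, $\mathbf{H} \in \mathbb{C}^{N_r\times K}$, constellation set $\mathcal{X}$, $\sigma_v^2$, a priori LLRs $L(b_{k,j}^{a})$, number of turbo iterations $T_I$.
1. Find the sets of symbol vectors $\mathcal{X}_{k,j}^{1} \cap \mathcal{C}^{\text{IDD}}$ and $\mathcal{X}_{k,j}^{0} \cap \mathcal{C}^{\text{IDD}}$
2. for $t \leftarrow 1, \ldots, T_I$ do {Turbo iteration}
3.   for $k \leftarrow 1, \ldots, K$ do
4.     for $j \leftarrow 1, \ldots, J$ do
5.       for $\mathbf{s} \in \mathcal{X}_{k,j}^{1} \cap \mathcal{C}^{\text{IDD}}$ do
6.         $\mathbf{b} \leftarrow \text{demap}(\mathbf{s})$, $b_{k,j} \leftarrow$ …
7.         $P(\mathbf{s}) \leftarrow$ symbol probability via (38)
8.         $\Lambda_n^{1} \leftarrow \ln P(\mathbf{s})$
9.       end for
10.      for $\mathbf{s} \in \mathcal{X}_{k,j}^{0} \cap \mathcal{C}^{\text{IDD}}$ do
11.        $\mathbf{b} \leftarrow \text{demap}(\mathbf{s})$, $b_{k,j} \leftarrow$ …
12.        $P(\mathbf{s}) \leftarrow$ symbol probability via (38)
13.        $\Lambda_n^{0} \leftarrow \ln P(\mathbf{s}) - \ldots$
14.      end for
15.      $L(b_{k,j}^{e}) \leftarrow \max\{\Lambda_n^{1},\, n = 1, \ldots, |\mathcal{X}_{k,j}^{1}|\} - \max\{\Lambda_n^{0},\, n = 1, \ldots, |\mathcal{X}_{k,j}^{0}|\}$
16.    end for {Bit label}
17.  end for {Antenna stream}
18.  Deinterleave the extrinsic $L(b_{k,j}^{e})$
19.  Perform BCJR decoding and compute the decoder extrinsic LLRs
20.  Interleave the decoder extrinsic LLRs and feed them back to the detector
21. end for {Turbo iteration}
22. The decision on the systematic bits is obtained via $\text{sign}\{L(m_k)\}$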
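The soft quantities (37) and (38) and the max-log LLR of Algorithm 1 (line 15) can be sketched as follows. Because lines 8 and 13 of Algorithm 1 are only partly legible, this sketch assumes the standard max-log metric $\Lambda(\mathbf{s}) = -\|\mathbf{r} - \mathbf{H}\mathbf{s}\|^2/\sigma_v^2 + \sum_{u}\ln P[s_u]$ over the candidate list. The Gray mapping `qpsk_demap`, the `clip` value used when one bit hypothesis has no candidate, and all function names are hypothetical.

```python
import math


def bit_prob(llr, b):
    """P[b_{j,k} = b] from an a priori LLR, (37); b is +1 or -1."""
    return 0.5 * (1.0 + b * math.tanh(llr / 2.0))


def symbol_prob(bits, llrs):
    """P[s_k = c_q] as the product of the bitwise probabilities, (38)."""
    p = 1.0
    for b, llr in zip(bits, llrs):
        p *= bit_prob(llr, b)
    return p


def qpsk_demap(x):
    """Hypothetical Gray map: first bit = sign of real part, second = sign of imaginary part."""
    return (1 if x.real > 0 else -1, 1 if x.imag > 0 else -1)


def maxlog_llr(candidates, r, H, noise_var, prior_llrs, demap, k, j, clip=8.0):
    """Max-log LLR of bit j of user k over a candidate list (cf. Algorithm 1, line 15).

    candidates: iterable of symbol tuples; prior_llrs[u][m]: decoder LLR of bit m
    of user u; clip: value returned when one hypothesis has no candidate.
    """
    best = {+1: -math.inf, -1: -math.inf}
    for s in candidates:
        dist = sum(abs(ri - sum(h * x for h, x in zip(row, s))) ** 2
                   for ri, row in zip(r, H))
        # guard against log(0) when a prior LLR saturates tanh
        log_prior = sum(math.log(max(symbol_prob(demap(x), prior_llrs[u]), 1e-300))
                        for u, x in enumerate(s))
        metric = -dist / noise_var + log_prior
        b = demap(s[k])[j]
        best[b] = max(best[b], metric)
    if math.isinf(best[+1]) or math.isinf(best[-1]):
        return clip if best[+1] > best[-1] else -clip
    return best[+1] - best[-1]
```

For a single user with $\mathbf{H} = 1$, $\sigma_v^2 = 1$, zero priors and $\mathbf{r} = 0.9 + 0.8j$, the first-bit LLR is the distance gap $3.65 - 0.05 = 3.6$ between the best candidates with negative and positive real parts.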



V. Simulations

In this section, several numerical examples are given to demonstrate the overall system performance of the proposed algorithms. In the following simulations, unless otherwise stated, the proposed algorithms and all their counterparts operate over an independent and identically distributed (i.i.d.) block-fading channel. The channel is Rayleigh fading, with coefficients drawn from zero-mean, unit-variance complex Gaussian random variables. The following parameters are also assumed: QPSK modulation is used; the transmitted vectors s[i] are grouped into frames of 500 vectors, of which the first ten, s[1], ..., s[10], are training vectors. Within each frame, the channel between each transmit and receive antenna pair is fixed and a single path is assumed.

Fig. 5 shows the MSE of the symbol estimates across all 8 user streams versus the number of RLS iterations, with 8 receive antennas, $E_b/N_0 = 20$ dB and a normalized Doppler frequency $f_dT = 10^{-3}$. The proposed P-DFCC scheme shows an improvement in terms of MSE, and the figure shows that the P-DFCC is able to track the fading channel with $f_dT = 10^{-3}$.


[Figure: MSE versus RLS iterations (0 to 500); curves: P-DF, P-DFCC]

Fig. 5. MSE of the estimated symbols versus RLS iterations, with 8 users. After 10 training vectors have been transmitted, the decision-directed mode is switched on. The MSE is significantly reduced.
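The switch from training to decision-directed operation in Fig. 5 can be illustrated with a plain linear RLS receive filter. This sketch is a simplified stand-in, not the P-DFCC or P-DF-RLS detector: it omits decision feedback and constellation constraints entirely, and the regularization constant `delta` and the function names are assumptions. The forgetting factor defaults to $\lambda = 0.998$, the value used in the simulations.

```python
import numpy as np


def qpsk_slice(y):
    """Hard QPSK decision used as the reference signal in decision-directed mode."""
    return (np.sign(y.real) + 1j * np.sign(y.imag)) / np.sqrt(2)


def rls_receiver(r, s, n_train=10, lam=0.998, delta=1e-2):
    """Exponentially weighted RLS linear receive filters, one column per stream.

    r: (N_r, T) received vectors; s: (K, T) transmitted symbols, used as the
    reference during training and to measure the MSE afterwards.
    Returns the a priori MSE at each RLS iteration, averaged over the K streams.
    """
    n_r, t_len = r.shape
    k = s.shape[0]
    w = np.zeros((n_r, k), dtype=complex)      # filter outputs are y = W^H r
    p = np.eye(n_r, dtype=complex) / delta     # inverse correlation matrix estimate
    mse = np.empty(t_len)
    for i in range(t_len):
        x = r[:, i]
        y = w.conj().T @ x                     # a priori outputs for all K streams
        mse[i] = np.mean(np.abs(s[:, i] - y) ** 2)
        # Training symbols first, then the receiver's own hard decisions.
        d = s[:, i] if i < n_train else qpsk_slice(y)
        px = p @ x
        g = px / (lam + np.vdot(x, px))        # RLS gain vector
        e = d - y                              # a priori error per stream
        w = w + np.outer(g, e.conj())
        p = (p - np.outer(g, x.conj() @ p)) / lam
    return mse
```

All K streams share the same input vector, so a single inverse correlation matrix `p` serves every filter column; only the error term differs per stream.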

The performance is also measured in terms of bit error rate (BER), obtained from $10^4$ Monte Carlo runs. In our simulations, the SNR per transmitted information bit is defined as

$$\frac{E_b}{N_0} = 10 \log_{10} \frac{N_r E_s}{R K \log_2 C \, \sigma_n^2} \ \mathrm{dB}. \qquad (42)$$



The total transmitted power $E_s$ is evenly distributed across the $K$ active users. The $N_r$ receive antennas collect a total power of $N_r E_s$, which carries $K \log_2 C$ coded bits or $RK \log_2 C$ information bits, where $R \le 1$ is the channel coding rate; a rate $R < 1$ introduces redundancy. A coding rate of $R = 1$ is assumed for the simulations without channel coding.
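Following the power accounting above (a total collected power $N_r E_s$ carrying $RK\log_2 C$ information bits, so $E_b = N_r E_s / (RK\log_2 C)$), the noise variance for a target $E_b/N_0$ can be computed as below. This is a sketch assuming complex noise with $N_0 = \sigma_n^2$; the function names are hypothetical.

```python
import math


def noise_variance(eb_n0_db, n_r, k_users, es_total, c, rate=1.0):
    """Noise variance sigma_n^2 = N_0 for a given E_b/N_0 in dB.

    The N_r receive antennas collect N_r * E_s, carrying R * K * log2(C)
    information bits, so E_b = N_r * E_s / (R * K * log2(C)).
    """
    eb = n_r * es_total / (rate * k_users * math.log2(c))
    return eb / 10 ** (eb_n0_db / 10)


def eb_n0_db(noise_var, n_r, k_users, es_total, c, rate=1.0):
    """Inverse mapping: E_b/N_0 in dB for a given noise variance."""
    eb = n_r * es_total / (rate * k_users * math.log2(c))
    return 10 * math.log10(eb / noise_var)
```

For example, with $N_r = K = 4$, QPSK ($C = 4$), $R = 1$ and $E_s = K = 4$, the energy per information bit is $E_b = 2$, and $E_b/N_0 = 10$ dB gives $\sigma_n^2 = 0.2$.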



[Figure: BER versus $E_b/N_0$ (dB); curves: P-DF, P-DFCC ($d_{th} = 0.5$), P-DFCC ($d_{th} = 0.05$), SD (ML)]

Fig. 6. BER vs. $E_b/N_0$: the proposed P-DFCC detection achieves near-optimal performance in a 4-user system configuration. The constellation threshold $d_{th}$ introduces a trade-off between performance and complexity.

Fig. 6 shows the BER against $E_b/N_0$. The channel is estimated by the LS algorithm. The P-DF-RLS detector ($\lambda = 0.998$) proposed in [18] exhibits a performance loss of about 7 dB at a target BER of $10^{-3}$ compared with the SD. As for the SD, with a sufficiently large sphere radius it always produces the ML solution. With the constellation constraint threshold $d_{th} = 0.05$, the proposed P-DFCC-RLS ($\lambda = 0.998$) algorithm shows near-optimal BER performance at the target BER of $10^{-3}$. From Fig. 6 we can verify that the optimal ML detector (or sphere decoder) attains full diversity. On the other hand, similar to MMSE-based successive interference cancellation (SIC) detection, the S-DF detector obtains a diversity order of $N_r - K + k$, and its BER performance is bounded by the user with the worst performance.

It is also worth mentioning that the diversity order of the traditional P-DF algorithm is usually lower than that of the channel-power-sorted S-DF [9]; this is due to error propagation. In P-DF, an erroneous symbol propagates through all other users' data streams. However, if all the detected symbols are highly reliable, P-DF may provide a higher diversity order than S-DF. This can be verified by assuming perfect cancellation, in which case P-DF achieves the full receive diversity order $N_r$ while S-DF achieves only $N_r - K + k$.
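As a small worked example of these diversity orders: with $N_r = K = 4$, S-DF gives the successively detected streams diversity orders 1, 2, 3 and 4, so the BER is dominated by the first detected stream, whereas P-DF with perfect cancellation gives every stream diversity $N_r = 4$. The function names below are illustrative.

```python
def sdf_diversity(n_r, k_users):
    """Diversity order N_r - K + k of the k-th detected stream in S-DF, k = 1..K."""
    if n_r < k_users:
        raise ValueError("requires N_r >= K")
    return [n_r - k_users + k for k in range(1, k_users + 1)]


def pdf_perfect_cancellation_diversity(n_r, k_users):
    """With perfect cancellation, each P-DF stream sees all N_r receive antennas."""
    return [n_r] * k_users


if __name__ == "__main__":
    n_r, k_users = 4, 4
    orders = sdf_diversity(n_r, k_users)
    print("S-DF:", orders, "worst:", min(orders))
    print("P-DF (perfect cancellation):", pdf_perfect_cancellation_diversity(n_r, k_users))
```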



By introducing a reliability checking procedure, the diversity order of the proposed P-DFCC can be adjusted. The diversity order is controlled in two ways. (1) The selection of $d_{th}$: Fig. 6 shows that the diversity order is directly related to the threshold $d_{th}$. Namely, decreasing $d_{th}$ changes the shape of the constellation constraint and increases the diversity order. (2) The received SNR region: in the low SNR region, the scheme is likely to list more candidates than in the high SNR region, and the performance approaches that of the ML detector. In the high SNR region, on the other hand, all the symbol estimates are considered reliable and the diversity order tends to that of a conventional P-DF. Therefore, the gain of the proposed P-DFCC scheme is highest in the low to medium SNR range.
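The role of $d_{th}$ can be illustrated with a hypothetical reliability check. The rule below is an illustrative assumption, not the paper's exact constellation constraint: an estimate counts as reliable if it lies within distance $d_{th}$ of its nearest QPSK point, and otherwise the `n_cand` nearest points are fed back as candidates. It reproduces only the qualitative trend described above, namely that a smaller $d_{th}$ flags more estimates as unreliable and therefore generates more candidates.

```python
import numpy as np

# Unit-energy QPSK constellation.
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)


def feedback_candidates(z, d_th, n_cand=2, constellation=QPSK):
    """Hypothetical reliability check on a soft estimate z.

    Reliable (nearest point within d_th): feed back the single hard decision.
    Unreliable: feed back the n_cand nearest constellation points as candidates.
    """
    dist = np.abs(constellation - z)
    order = np.argsort(dist)
    if dist[order[0]] <= d_th:
        return constellation[order[:1]]
    return constellation[order[:n_cand]]


def fraction_unreliable(estimates, d_th):
    """Share of estimates for which more than one candidate is fed back."""
    return float(np.mean([len(feedback_candidates(z, d_th)) > 1 for z in estimates]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sent = rng.choice(QPSK, size=2000)
    soft = sent + 0.2 * (rng.standard_normal(2000) + 1j * rng.standard_normal(2000))
    for d_th in (0.5, 0.3, 0.05):
        print(d_th, fraction_unreliable(soft, d_th))
```

Because every estimate that is unreliable at a larger threshold is also unreliable at a smaller one, the unreliable fraction can only grow as $d_{th}$ decreases.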




[Figure: BER versus $E_b/N_0$ (dB)]

Fig. 7. BER vs. $E_b/N_0$: the proposed P-DFCC detection achieves near-optimal performance in a 4-user system configuration with 16-QAM symbols.

Another simulation is carried out with 16-QAM symbols, and the BER versus SNR curves are plotted in Fig. 7. The threshold is set to $d_{th} = 0.1$. As with QPSK modulation, the proposed P-DFCC detection algorithm achieves better performance than both the traditional P-DF and S-DF algorithms.

Fig. 8 compares the BER performance for various normalized Doppler frequencies $f_dT$ in time-varying channels when $E_b/N_0 = 14$ dB. In this simulation, each channel between a transmit and receive antenna pair varies according to Jakes' model [20]. LS channel estimation is applied to the unknown channel, with a training sequence of length 20. The simulation results show that the proposed P-DFCC significantly outperforms the traditional P-DF detector and approaches the SD performance in time-varying channels.
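The time-varying channel and the LS estimation step can be sketched as follows. This sketch uses a random sum-of-sinusoids approximation of Clarke's fading model rather than the exact deterministic Jakes simulator, and the number of sinusoids and the function names are assumptions. The LS estimate is $\hat{\mathbf{H}} = \mathbf{R}\mathbf{S}^H(\mathbf{S}\mathbf{S}^H)^{-1}$, computed from $L \ge K$ training vectors stacked as the columns of $\mathbf{S}$ and $\mathbf{R}$.

```python
import numpy as np


def sos_fading(rng, n_samples, fd_t, n_sinusoids=16):
    """Random sum-of-sinusoids approximation of Clarke/Jakes Rayleigh fading.

    fd_t is the normalized Doppler frequency f_d T; the average power is one.
    """
    alpha = rng.uniform(0.0, 2 * np.pi, n_sinusoids)   # angles of arrival
    phi = rng.uniform(0.0, 2 * np.pi, n_sinusoids)     # random initial phases
    i = np.arange(n_samples)[:, None]                  # symbol index
    phase = 2 * np.pi * fd_t * i * np.cos(alpha) + phi
    return np.exp(1j * phase).sum(axis=1) / np.sqrt(n_sinusoids)


def time_varying_channel(rng, n_r, k, n_samples, fd_t):
    """Independent fading for every transmit/receive antenna pair; shape (T, N_r, K)."""
    h = np.empty((n_samples, n_r, k), dtype=complex)
    for a in range(n_r):
        for b in range(k):
            h[:, a, b] = sos_fading(rng, n_samples, fd_t)
    return h


def ls_channel_estimate(r_train, s_train):
    """LS estimate H_hat = R S^H (S S^H)^{-1}; needs L >= K training vectors."""
    s_h = s_train.conj().T
    return r_train @ s_h @ np.linalg.inv(s_train @ s_h)
```

At $f_dT = 10^{-3}$ the channel changes very little over 20 training symbols, which is why a single LS estimate per training block is a reasonable approximation in this setting.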

In Fig. 9 the complexity is given by counting the required complex multiplications as the number of users increases. P-DFCC has a complexity slightly above that of P-DF while achieving a significant performance improvement. The threshold $d_{th}$ is introduced to reduce the complexity and improve the performance. We use fixed-complexity sphere decoders (FSD) to compare the complexity. It should be noted that the FSD is one of the lowest-complexity SD algorithms known.

[Figure: complexity versus number of users $K$]

Fig. 9. Complexity in terms of arithmetic operations against the number of transmit antennas; the P-DFCC has a complexity comparable to that of the P-DF algorithm. $d_{th} = 0.3$.

The curves in Fig. 10 show the convolutionally coded BER performance on a Rayleigh block-fading channel. The proposed P-DFCC with $d_{th} = 0.3$ improves on the conventional P-DF detection performance by about 3 dB at a target coded BER of $10^{-4}$. The P-DFCC detector approaches the optimal MAP detection performance, with a loss of only 1.5 dB at a coded BER of $10^{-4}$.

VI. Conclusion 

In this paper, we have derived an adaptive decision-feedback-based detector for MIMO transmission systems with time-varying channels. In this context, we have presented a novel way to improve the BER performance using parallel decision feedback with constellation constraints, in which a threshold is introduced to reduce the complexity and improve the performance. This approach reduces the MSE of traditional parallel decision feedback detection, effectively improves the BER performance of parallel interference cancellation schemes, and obtains close-to-optimal performance with low additional detection complexity.